Slotted memory access method

ABSTRACT

A method of accessing a shared memory store at a multiported data network node is provided. The method provides for a deterministic access schedule to be used in apportioning processing bandwidth between data ports and bus connected devices used in processing conveyed data. Advantages are derived from eliminating data processing latencies otherwise incurred from: data bus arbitration related to handshaking, arbitration request processing, and switching between read and write memory access cycles.

[0001] This application claims the benefit of U.S. Provisional Application No. 60/236,165, filed Sep. 29, 2000.

FIELD OF THE INVENTION

[0002] The invention relates to data switching, and in particular to methods of scheduling access to a shared memory store in switching Protocol Data Units (PDUs) at a data switching node.

BACKGROUND OF THE INVENTION

[0003] In the field of data switching, the performance of data switching nodes participating in data transport networks is of utmost importance.

[0004] Data switching nodes are multi-ported data network nodes which forward Protocol Data Units (PDUs) between input data ports and output data ports.

[0005] The basic operation of a data switching node includes: receiving at least one PDU, storing the PDU in memory while an appropriate output port is determined, scheduling the PDU for transmission, and transmitting the PDU via the determined output port. This is known in the industry as a “store and forward” method of operation of data switching nodes. Although in principle the operation of data switching nodes is simple, the implementation of such devices is not trivial.

[0006] PDUs include and are not limited to: cells, frames, packets, etc. Each PDU has a size. Cells have a fixed size while frames and packets may vary in size. Each PDU has associated header information having specifiers holding information used in forwarding the PDU towards a destination data network node in the associated data transport network. The header information is consulted at data switching nodes in determining an output port via which to forward the PDU towards the destination data network node.

[0007] FIG. 1 is a schematic diagram showing a general design of a data switching node 100. The data switching node 100 is a multi-ported device storing PDUs in a shared memory buffer 104, and forwarding PDUs between N physical ports 102.

[0008] Each physical port 102 is adapted to receive and transmit data via an associated physical link 118 as shown at 112. Each physical port 102 has an associated physical data transfer rate.

[0009] Although the physical ports 102 may receive and transmit continuously, the data has a structure defined by the PDUs conveyed. In the absence of physical data to convey, the physical ports 102 can be configured to convey empty PDUs in accordance with the specification of the data transfer protocol(s) employed by the physical port 102. Data transfer rates below the physical data transfer rate of a physical port 102 may also be obtained by inserting empty PDUs in the data stream conveyed therethrough. Data transfer rates above the physical data transfer rate of a physical port 102 can be obtained through inverse multiplexing over a plurality of physical ports 102, the plurality of physical ports 102 collectively defining a logical data port (not shown). Physical ports 102 can be configured to convey data adhering to more than one data transfer protocol.

[0010] Each physical port 102 has access to the shared memory buffer 104 via a data bus 106 in a coordinated manner enforced by an arbiter 108. Also accessing the shared memory buffer 104 is a PDU classifier 110. The PDU classifier 110 operates on header information of PDUs pending processing.

[0011] The PDU classifier 110 inspects a body of routing information (switching database) also stored in the shared memory buffer 104 and in so doing requires a portion of the bandwidth on the data bus 106. The sharing of data storage resources for PDU buffering and for storing routing information consolidates the memory storage requirements and simplifies the design of the data switching node 100 leading to reduced implementation costs.

[0012] The data bus 106 also has an associated physical data transfer rate which is typically different from the data transfer rates of the physical ports 102. In order to match data transfer rates between the physical ports 102 and the data bus 106, the conveyed PDUs are buffered in receive 114 and transmit 116 buffers. Physical port 102 designs having an adjustable physical data transfer rate exist. As such, the data switching node 100 besides being multi-ported, can be configured to accommodate interfaces having varied physical data transfer rates while adhering to multiple data transfer protocols.

[0013] The core of the data switching node 100 includes processes and hardware enabling the conveyance of PDUs between receive buffers 114, shared memory buffer 104 and transmit buffers 116. A design requirement is that the data switching node 100 be able to process and convey PDUs such that all physical ports 102 receive and transmit simultaneously at their full physical transfer rates.

[0014] Once received (112) over a physical link 118, a PDU is buffered in the receive buffer 114 associated with the input physical port 102. A switching process implemented by the data switching node 100 determines at least one output port 102 via which to forward the PDU towards its destination data network node. The PDU classifier 110 makes a determination whether the PDU is a unicast PDU or a multicast PDU. Multiple output ports 102 may be determined for multicast PDUs. Once processed and at least one output physical port 102 determined, the PDU is buffered in the corresponding transmit buffer(s) 116 awaiting transmission over the corresponding physical link(s) 118.

[0015] In servicing a receive buffer 114, the corresponding physical port 102 uses a corresponding Direct Memory Access (DMA) device 120 to access (122) the shared memory buffer 104 via the data bus 106. The DMA device 120, upon gaining access to the shared memory buffer 104, writes at least one PDU thereto for further processing.

[0016] The PDU classifier 110 also uses another corresponding DMA device 120 to access (124) the shared memory buffer 104 in processing pending PDUs. The PDU classifier 110 as mentioned above, makes use of the header information associated with each PDU and consults (124) routing information in determining the output port(s) 102 to forward the PDU to.

[0017] As a side effect of processing PDUs, the PDU classifier 110 may also modify (124) the body of routing information held in the shared memory buffer 104 in which case the DMA device 120 writes (124) to the shared memory buffer 104. Modifications of the routing information are necessary in establishing new data transport routes in the data transport network for data sessions and the associated data transfers.

[0018] Once at least one output port 102 is determined for a PDU pending processing in the shared memory buffer 104, a DMA device 120 corresponding to at least one of the determined output ports 102 loads (122) the PDU from the shared memory buffer 104 into the corresponding transmit buffer 116 where the PDU awaits transmission over the physical link 118. If the PDU is to be multicasted, additional DMA devices 120 corresponding to the other determined output ports 102 load (122) the PDU into corresponding transmit buffers 116.

[0019] As the DMA devices 120 are used to perform PDU data transfers 122 as well as data transfers 124 in accessing routing information, the DMA devices 120 contend for the data bus 106. The DMA devices 120 communicate with the arbiter 108 via an access control bus 128. Typically the access control bus 128 is integral to the data bus 106.

[0020] The arbiter 108 enforces a controlled access 130 to the data bus 106 in coordinating the access to the shared memory buffer 104 for all DMA devices 120. Coordinating the access to the data bus 106 efficiently for all DMA devices 120 is essential in attaining efficient performance of the data switching node 100.

[0021] Methods of coordinating access to the data bus 106 for multiple DMA devices 120 exist and provide a variety of advantages.

[0022] Typically round-robin or weighted round-robin data bus access arbitration techniques are used in scheduling access to the shared memory buffer 104 via the data bus 106. In accordance with round-robin data bus arbitration techniques, the DMA devices 120 issue requests (126) for memory access cycles via the access control bus 128 to the arbiter 108. The arbiter 108 issues grant responses (126) to the DMA devices 120 to take over the data bus 106 for a memory access cycle to perform read, write, and modify operations on the shared memory buffer 104. The request for a memory access cycle and the grant response is known in the field as request-grant handshaking.
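By way of illustration only, the grant decision of a plain round-robin arbiter can be sketched as follows. The function name `round_robin_grant` and the boolean request list are hypothetical and are not taken from the design of FIG. 1; the sketch merely shows that a grant can only be issued after pending requests have been collected and examined.

```python
def round_robin_grant(requests, last_granted):
    """Return the index of the next requesting device after last_granted.

    requests: list of booleans, one per DMA device (True = request pending).
    last_granted: index of the device that held the bus most recently.
    Returns None when no device is requesting.
    """
    n = len(requests)
    # Scan the devices in circular order, starting just after the last holder.
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None


# Hypothetical example: devices 0 and 2 are requesting the bus.
pending = [True, False, True]
first = round_robin_grant(pending, last_granted=0)   # device 2 is next in turn
second = round_robin_grant(pending, last_granted=2)  # wraps around to device 0
```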

[0023] The need for request-grant handshaking is a disadvantage limiting data switching performance of the data switching node 100. There is an overhead incurred in the request-grant handshaking process: in receiving a request and issuing a grant response as well as in processing each request. The handshaking steps themselves use clock cycles in conveying requests and grant responses over the access control bus 128 leading to increased memory access cycle times i.e. longer reads, longer writes, and longer modifies.

[0024] There is an unpredictability in the waiting time between the request and the grant response which is found in field trials to favor large data transfer bursts. With large data transfer bursts being favored, data switching node designs tend to require the use of large receive buffers 114 and large transmit buffers 116. Field trials show that using small buffers leads to data over-run and data under-run errors in conveying PDUs over the physical ports 102.

[0025] The unpredictability of the handshake turnaround time is therefore detrimental to the turnaround time in processing PDUs at the data switching node 100 since the access by the PDU classifier 110 to the data bus 106 tends to be short, random and frequent in comparison with PDU data transfers. Therefore PDU data transfers are favored in comparison to switching database accesses while being delayed in processing. This may lead to a condition in which an unnecessarily large number of PDUs are awaiting processing in the shared memory buffer 104.

[0026] Moreover, favoring large data transfers has a detrimental effect in data switching environments in which PDUs have variable sizes by favoring the conveyance of large size PDUs to and from the shared memory buffer 104. This adds a delay in processing comparatively short PDUs. The effects of the incurred delay compound considering the fact that shorter PDUs have a larger processing overhead to PDU size ratio thereby delaying larger processing overheads.

[0027] Round-robin techniques are found to be well suited for arbitrating data bus access for the conveyance of fixed size PDUs at data switching nodes 100 having physical ports 102 conveying data at the same physical data transfer rate. However the trends in the field of data switching call for greater port densities per data switching node 100 and greater port diversity at the data switching node 100 both in physical data transfer rates and data transfer protocols supported.

[0028] A prior art International Publication, Number WO 00/72524 A1, entitled “APPARATUS AND METHOD FOR PROGRAMMABLE MEMORY ACCESS SLOT ASSIGNMENT”, filed on Dec. 17, 1999 by Yu et al., describes a memory access scheme in which the granting of memory access requests is pre-specified. Although the handshake turnaround time is reduced, memory bandwidth utilization is not very efficient because memory access request processing is still required.

[0029] Therefore there is a need to provide methods of efficiently scheduling access to the data bus 106 in enhancing the performance of the data switching node 100.

SUMMARY OF THE INVENTION

[0030] In accordance with an aspect of the invention, a data network node processing data is presented. Components of the data network node include a divided aggregate data bus, a divided shared memory store accessed via the aggregate data bus and a plurality of data bus connected devices. A deterministic data bus arbitration schedule is used to apportion an aggregate bandwidth of the aggregate data bus to the plurality of data bus connected devices.

[0031] In accordance with another aspect of the invention, the deterministic data bus arbitration schedule specifies the grouping of read memory access cycles sequentially and the grouping of write memory access cycles sequentially such that changes between read memory access cycles and write memory access cycles are reduced.

[0032] In accordance with yet another aspect of the invention, a method of arbitrating access to a divided aggregate data bus for a plurality of data bus connected devices is presented. The method includes a sequence of steps. Each stream of data to be conveyed via the aggregate data bus is divided into an aggregate stream of data granules. The access to the aggregate data bus is coordinated according to a deterministic data bus arbitration schedule. The aggregate stream of data granules is conveyed over the aggregate data bus in accordance with the deterministic data bus arbitration schedule.

[0033] Data processing latencies in arbitrating the access to the aggregate data bus for the plurality of data bus connected devices are reduced.

[0034] The advantages are derived from eliminating latencies otherwise incurred from: data bus arbitration related handshaking, arbitration request processing in scheduling access to the data bus, and switching between read and write memory access cycles.

BRIEF DESCRIPTION OF THE DRAWINGS

[0035] The features and advantages of the invention will become more apparent from the following detailed description of the preferred embodiments with reference to the attached diagrams wherein:

[0036] FIG. 1 is a schematic diagram showing a general design of a data switching node;

[0037] FIG. 2 is a schematic diagram showing a detail of an access schedule for a shared memory buffer in accordance with an exemplary embodiment of the invention;

[0038] FIG. 3 is a schematic diagram showing an exemplary design architecture of a data switching node in accordance with an exemplary embodiment of the invention;

[0039] FIG. 4 is a schematic diagram showing a detail of the architecture of a data switching node in accordance with an exemplary implementation of the preferred embodiment of the invention;

[0040] FIG. 5 is a schematic diagram showing a detail of the architecture of a data switching node in accordance with another exemplary implementation of the preferred embodiment of the invention;

[0041] FIG. 6 is a schematic diagram showing a detail of the architecture of a data switching node in accordance with another exemplary implementation of the preferred embodiment of the invention;

[0042] FIG. 7 is a schematic diagram showing a detail of the architecture of a data switching node in accordance with another exemplary implementation of the preferred embodiment of the invention;

[0043] FIG. 8 is a schematic diagram showing details of an exemplary memory access schedule in accordance with an exemplary embodiment of the invention; and

[0044] FIG. 9 is a schematic diagram showing a data switching node having a plurality of data bus connected devices in accordance with yet another exemplary implementation of the preferred embodiment of the invention.

[0045] It will be noted that in the attached diagrams like features bear similar labels.

DETAILED DESCRIPTION OF THE EMBODIMENTS

[0046] Several parameters are important in designing data switching nodes.

[0047] A minimum memory bandwidth requirement is imposed on the shared memory buffer 104. The minimum memory bandwidth B of the shared memory buffer 104 has to accommodate all N physical ports 102 transmitting and receiving data simultaneously at their full physical data transfer rates without discarding PDUs, as well as the bandwidth C required by the PDU classifier 110 in processing PDUs. If the physical ports 102 convey data at different rates, then the minimum theoretical bandwidth required is given by: $B = C + 2\sum_{j=0}^{N-1} r_{j},$

[0048] where $r_{j}$ represents the physical data transfer rate of port j.
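By way of illustration only, the theoretical estimate can be evaluated numerically as sketched below. The port mix is the one used for the exemplary schedule of Table 1 (24 ports at 100 Mbit/s and two ports at 1 Gbit/s); the classifier bandwidth C of 500 Mbit/s is a purely hypothetical figure, since no value for C is given herein.

```python
def min_memory_bandwidth(port_rates_bps, classifier_bps):
    """Return B = C + 2 * sum(r_j), in bits per second."""
    # Every port both receives (a memory write) and transmits (a memory read),
    # which is why the summed port rates are counted twice.
    return classifier_bps + 2 * sum(port_rates_bps)


# 24 ports at 100 Mbit/s plus two ports at 1 Gbit/s.
port_rates = [100_000_000] * 24 + [1_000_000_000] * 2

# Hypothetical classifier bandwidth C (an assumption made for this sketch).
classifier_rate = 500_000_000

B = min_memory_bandwidth(port_rates, classifier_rate)
print(B / 1e9, "Gbit/s")
```

Under these assumptions the port traffic alone accounts for 8.8 Gbit/s of the total, so the estimate is dominated by the port rates rather than by the classifier.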

[0049] Field trials show that the minimum bandwidth required has to be greater than the theoretical estimate due to overheads incurred in the request-grant handshaking, overheads incurred in changing between read and write memory cycles, etc. The more frequent switching between read and write memory cycles the less efficient the memory bandwidth utilization.

[0050] In accordance with the preferred embodiment of the invention, a slotted memory access scheme is used to arbitrate the access to the data bus 106 and therefore to the shared memory buffer 104.

[0051] FIG. 2 is a schematic diagram showing a detail of an access schedule (200) to a shared memory buffer (104) in accordance with an exemplary embodiment of the invention.

[0052] The data bus arbitration schedule 200 is divided into time frames 202, each time frame 202 being further subdivided into time slots 204. A deterministic data bus arbitration schedule 200 is enforced. Read, write, and modify memory access cycles are performed during time slots 204. Memory access cycles are assigned to DMA devices 120 and it is left up to each individual DMA device 120 to make use of the assigned time slot(s) 204. This removes the necessity of sending and receiving memory access requests and therefore eliminates the overhead involved in conveying and processing them, leading to a more efficient use of the memory bandwidth B.

[0053] The length of the time frame 202 is left up to design choice. Using more time slots 204 per time frame 202, a more granular control can be effected as each time slot 204 represents a smaller percentage of the time frame 202.

[0054] The memory bandwidth B can therefore be partitioned effectively to match the data transfer rate requirements of data bus connected devices including each one of the physical ports 102 and the PDU classifier 110. The need to overbudget the memory requirements for the receive 114 and transmit 116 buffers is reduced thereby reducing implementation costs.
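By way of illustration only, one simple way of partitioning a time frame is to give each data bus connected device the smallest whole number of time slots that covers its required rate. The device names, the rates, and the figure of 50 Mbit/s carried per time slot are hypothetical values chosen for this sketch and are not taken from the embodiments described herein.

```python
def allocate_slots(rates_bps, slot_bps):
    """Return, per device, the number of time slots covering its rate.

    rates_bps: mapping of device name to required rate in bits per second.
    slot_bps: bandwidth represented by one time slot per time frame.
    """
    # -(-a // b) is integer ceiling division, so a partial slot rounds up.
    return {dev: -(-rate // slot_bps) for dev, rate in rates_bps.items()}


def fits_in_frame(slots, frame_slots):
    """Check that the allocation does not exceed the length of the time frame."""
    return sum(slots.values()) <= frame_slots


# Hypothetical devices and rates.
rates = {
    "port0": 100_000_000,
    "port1": 100_000_000,
    "gig0": 1_000_000_000,
    "classifier": 120_000_000,
}
slots = allocate_slots(rates, slot_bps=50_000_000)
print(slots, fits_in_frame(slots, frame_slots=32))
```

Rounding up means that the classifier in this example is given three slots for a rate that needs 2.4, which shows why a longer time frame with smaller slots wastes less bandwidth on rounding.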

[0055] The partitioning of the memory bandwidth may be set explicitly via a management console, chosen from a group of pre-set bus arbitration schedules, actively monitored and managed by a higher layer protocol monitoring PDU data flows, etc. without limiting the invention. An optimum memory access schedule can be determined through gathering statistics on memory access cycles and the utilization of processing queues at the data switching node.

[0056] The deterministic arbitration schedule provides a guaranteed bandwidth to the PDU classifier 110 and enables the PDU classifier 110 to process pending PDUs in a timely manner reducing the occurrence of PDU processing backlogs in the shared memory buffer 104.

[0057] Further, in accordance with the preferred embodiment of the invention, with the bandwidth requirement fulfilled, the deterministic memory access schedule enables the grouping of all read memory access cycles and all write memory access cycles within each time frame 202. Further advantages are derived from reducing the occurrence of changes between read and write memory access cycles thereby further reducing PDU processing latencies.
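By way of illustration only, the benefit of grouping can be seen by counting the changes between read and write memory access cycles in one cyclically repeated time frame. The eight-slot frames below are hypothetical and far shorter than a practical time frame 202.

```python
def count_turnarounds(frame):
    """Count read/write changes in a frame that repeats cyclically.

    frame: list of 'R' (read) and 'W' (write) cycle types, one per time slot.
    """
    # Pair each slot with its successor; the last slot wraps to the first
    # because the time frame is paced through repeatedly.
    successors = frame[1:] + frame[:1]
    return sum(1 for a, b in zip(frame, successors) if a != b)


interleaved = ["R", "W"] * 4       # R W R W R W R W
grouped = sorted(interleaved)      # R R R R W W W W

print(count_turnarounds(interleaved), count_turnarounds(grouped))
```

Both frames carry the same four reads and four writes, yet the grouped frame incurs a change of direction only twice per frame, once going into the writes and once wrapping back to the reads.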

[0058] FIG. 3 is a schematic diagram showing an exemplary design architecture of a data switching node in accordance with the invention.

[0059] Shown in the diagram is data switching node 300 having arbiter 308 implementing the memory access scheme presented above with respect to FIG. 2. The arbiter 308 coordinates 326 the access of the DMA devices 120 to the data bus 106 by pacing repeatedly through the access schedule specified via the time frame 202—in a cyclical fashion. The use of a deterministic access schedule greatly simplifies the design of the data switching node 300.
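By way of illustration only, the cyclical pacing through a time frame can be sketched as below. The five-slot frame and the device names are hypothetical; the point of the sketch is that the owner of every time slot follows from the slot counter alone, without any request being received or processed.

```python
def pace(frame, n_slots):
    """Return (slot index, device) pairs for n_slots consecutive time slots.

    frame: list naming the device that owns each time slot of the time frame.
    """
    # The slot index wraps around at the end of the frame, so the same
    # assignment repeats frame after frame.
    return [(t % len(frame), frame[t % len(frame)]) for t in range(n_slots)]


# Hypothetical time frame of five time slots.
frame = ["rx0", "rx1", "classifier", "tx0", "tx1"]
timeline = pace(frame, 7)
for slot, device in timeline:
    print(slot, device)
```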

[0060] Another important parameter in designing data switching equipment is the width W of the data bus 106. Theoretically, if the clock frequency of the data bus 106 is F, then the required data bus width W is given by: $W \geq \frac{B}{F}.$
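By way of illustration only, the bound can be evaluated as sketched below. The bandwidth of 9.3 Gbit/s and the clock frequency of 100 MHz are hypothetical figures, and rounding the result up to a power of two is an assumption of this sketch reflecting common bus widths, not a requirement stated herein.

```python
def min_bus_width(bandwidth_bps, clock_hz):
    """Return the smallest whole bus width W, in bits, with W >= B / F."""
    # Integer ceiling division: a fractional bit still needs a full bus line.
    return -(-bandwidth_bps // clock_hz)


def next_power_of_two(n):
    """Round a positive integer up to the nearest power of two."""
    return 1 << (n - 1).bit_length()


# Hypothetical figures for B and F.
width = min_bus_width(9_300_000_000, 100_000_000)
print(width, next_power_of_two(width))
```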

[0061] Increasing the clock frequency F of the data bus 106 has drawbacks: the frequency is limited by the access speed of the shared memory buffer 104, a higher frequency increases cross-talk in the bus lines, and it reduces the synchronization window for signals on the data bus 106 (imposing more stringent requirements on signal propagation over the bus lines), etc.

[0062] In practice it becomes very hard to design wide data buses while increasing the port density per data switching node. The practical limitations include: an increase in complexity in routing bus lines, a loss of synchronization between the signals conveyed over the bus lines (unevenly long bus lines), increased cross-talk, and increased power drain, to name just a few.

[0063] Very wide data buses reduce the efficiency of the PDU classifier 110. While PDU sizes are typically larger than the width W of the data bus, switching database accesses can be significantly narrower. As an example, if switching database accesses are 48 bits wide while the data bus is W=128 bits wide, each PDU classifier access utilizes only 37.5% of the available bandwidth B, which represents a large processing overhead per access.

[0064] FIG. 4, FIG. 5, FIG. 6, FIG. 7 are schematic diagrams showing details of the architecture of data switching nodes in accordance with exemplary implementations of the preferred embodiment of the invention.

[0065] Common to all these exemplary implementations of data switching nodes 400, 500, 600 and 700 is the use of a shared memory space divided into blocks 404, with each shared memory block 404 being accessed by each of the N respective ports 402/602 via separate data buses 406. The designs show the shared memory space spanning over two memory blocks 404. The invention is not limited thereto, the use of more memory blocks 404 and corresponding data buses 406 being a design choice based on data switching equipment performance requirements balanced against implementation costs.

[0066] In optimizing the use of the available memory bandwidth B, the width w of each individual data bus 406 is chosen to reduce bandwidth loss in accessing the routing information.

[0067] The switching database access tends to be narrow and frequent. If for example, data transfers associated with the switching database accesses are 48 bits wide, using a single W=w=32 bit bus, one switching database access takes two clock cycles. One clock cycle is used for a single W=w=64 bit bus. The total bandwidth used is the same. However, if the width of the single data bus is increased to 128 bits, the switching database access will still take one clock cycle wasting 62.5% of the bandwidth per access. Therefore wider single buses are only efficient for long, continuous streams of data such as PDU transfers.

[0068] By dividing the effective aggregate data bus width W of the aggregate data bus into two W=2w=128 bits, the switching database access only takes one clock cycle on one of the two constituent data buses 406 w=64 bits wasting only 12.5% of the bandwidth per switching database access.
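By way of illustration only, the figures quoted above for a 48 bit switching database access can be reproduced as sketched below. The function name and its parameters are hypothetical. The wasted fraction is taken relative to the aggregate bus width, on the assumption that the other constituent data bus carries useful data during the same clock cycle.

```python
def access_cost(access_bits, bus_bits, aggregate_bits=None):
    """Return (clock cycles, wasted fraction) for one access.

    access_bits: width of the access, e.g. 48 for a switching database access.
    bus_bits: width of the bus that actually carries the access.
    aggregate_bits: aggregate width W; defaults to bus_bits for a single bus.
    """
    if aggregate_bits is None:
        aggregate_bits = bus_bits
    cycles = -(-access_bits // bus_bits)          # ceiling division
    wasted_bits = cycles * bus_bits - access_bits
    return cycles, wasted_bits / (cycles * aggregate_bits)


print(access_cost(48, 32))        # single 32 bit bus: two clock cycles
print(access_cost(48, 64))        # single 64 bit bus: one clock cycle
print(access_cost(48, 128))       # single 128 bit bus: one clock cycle
print(access_cost(48, 64, 128))   # one of two 64 bit constituent buses
```

The 32 bit and 64 bit single buses both move 64 bits in total for the access, which is why the bandwidth used is the same in the two cases.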

[0069] The choice of multiple constituent data buses 406, each of a data bus width w, maintains the complexity thereof while increasing the effective aggregate data bus width W without increasing the clock frequency F of each data bus 406 to provide increased effective aggregate bandwidth B.

[0070] Further, in accordance with the preferred embodiment of the invention, a DMA device 420 is used for each data transport direction for each physical port 402. Two DMA devices 420 per physical port 402 provide simultaneous access 422/424 to both of the data buses 406.

[0071] In arbitrating the access to the shared memory blocks 404, the data switching node 400 makes use of a single arbiter 408 associated with a single access control bus 428 exercising a controlled access 430 over each one of the data buses 406 by coordinating (426) DMA device 420 access thereto.

[0072] In arbitrating the access to the shared memory blocks 404, the data switching node 500 makes use of two arbiters 508 each associated with corresponding access control buses 528, each one of the arbiters 508 exercising controlled access 530 over a single corresponding data bus 406 coordinating (526) DMA device 420 access thereto.

[0073] As coordinated by arbiters 408/508 each one of the DMA devices 420 accesses 422/424 a data bus 406 during assigned time slots 204.

[0074] While each one of the physical ports 402 can be given simultaneous access to both data buses 406, for the data switching nodes 400 and 500, each physical port 402 is constrained to perform simultaneously one read operation and one write operation. More details regarding the data bus arbitration schedule will be presented below with reference to FIG. 8.

[0075] The restriction mentioned above is eliminated in the implementations of data switching nodes 600 and 700, where multiple DMA devices 620 are associated with receive 614 and transmit 616 buffers. The exemplary implementations shown make use of two DMA devices 620 per each receive buffer 614 and each transmit buffer 616. The invention is not limited thereto—the number of DMA devices 620 associated therewith being a function of the number of data buses 406 used, design requirements and associated implementation costs.

[0076] As shown, each DMA device 620 is granted access 426/726 to the data bus 406 associated therewith as directed by arbiters 408/508 and conveys data 622/624 therebetween.

[0077] FIG. 8 is a schematic diagram showing details of an exemplary memory access schedule in accordance with an exemplary embodiment of the invention.

[0078] A data bus arbitration schedule 800 has two time lines 810 each of which corresponds to one of the two data buses 406. Time frames 202 are cyclically paced through in assigning time slots 204 to DMA devices 420 to perform memory access cycles.

[0079] In accordance with an exemplary implementation of the invention, an exemplary partitioning of the bandwidth B is presented in Table 1. The data bus arbitration schedule 800 is specified by a 128 time slot time frame 202 having time slot 204 assignments for DMA devices 420 connected to the two data buses 406. The exemplary time slot assignment corresponds to a data switching node 400/500 having 24 physical ports operating at 10/100 Mbit/s and two physical ports operating at 1 Gbit/s. The read and write memory access cycles are sequenced into 5 groups.

[0080] As shown in FIG. 4, FIG. 5, FIG. 6 and FIG. 7, and Table 1, the shared memory blocks 404 are labeled as “odd” and “even”. One simple mode of operation of the data switching nodes 400/500/600/700 presented herein in accordance with the preferred embodiment of the invention includes dividing PDUs into an aggregate stream of PDU granules, each of which has a size equal to the width w of each constituent data bus 406. Odd PDU granules in the aggregate stream of granules represent an odd constituent stream of granules and are stored in the odd shared memory block 404. Similarly, even PDU granules in the aggregate stream of granules represent an even constituent stream of granules and are stored in the even shared memory block 404.

[0081] If each data bus 406 has a width w of 64 bits while the aggregate width W of the data bus is 128 bits, this leads to an 8 byte PDU granule size. Each PDU granule is conveyed during a single clock cycle over the w=64 bits wide data bus 406.
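By way of illustration only, the division of a PDU into 8 byte granules and their distribution over the two shared memory blocks can be sketched as below. The function name is hypothetical, and counting granules from one, so that the first granule of each PDU is an odd granule, is an assumption of this sketch.

```python
def split_into_granules(pdu, granule_bytes=8):
    """Split a PDU into granules and separate the odd and even streams.

    Returns (odd, even): lists of byte strings destined for the odd and
    even shared memory blocks respectively.
    """
    granules = [pdu[i:i + granule_bytes]
                for i in range(0, len(pdu), granule_bytes)]
    # Granules are counted from one, so list indices 0, 2, 4, ... hold the
    # first, third, fifth, ... (odd) granules.
    return granules[0::2], granules[1::2]


# Hypothetical 20 byte PDU: two full granules and one 4 byte final granule.
odd, even = split_into_granules(bytes(range(20)))
print([len(g) for g in odd], [len(g) for g in even])
```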

[0082] The 128 time slot time frame 202 contains one read and one write memory access cycle for each 100 Mb/s port 402 to each one of the two shared memory blocks 404. 1 Gb/s ports 402 are assigned 12 read and write memory access cycles per shared memory block 404 per time frame 202 such that between any consecutive same type memory access cycle to one shared memory block 404 there exists a same type memory access cycle to the other shared memory block 404. The two extra time slots ascribed to the 1 Gb/s ports 402 prevent potential bandwidth loss due to state machine recovery after receiving an end of PDU marker (EOF). Therefore the reading and writing from alternating shared memory blocks 404 can be performed with minimal loss of memory bandwidth B.
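By way of illustration only, the rate obtained from a given number of time slots can be estimated as sketched below. The sketch assumes that each time slot lasts one clock cycle and conveys one 64 bit granule, and it uses a hypothetical data bus clock of 100 MHz; neither the clock frequency nor the function name is taken from the embodiments described herein.

```python
def port_rate_bps(slots_per_block, clock_hz,
                  bus_bits=64, blocks=2, frame_slots=128):
    """Return the rate, in bits per second, in one direction for one port.

    slots_per_block: time slots per time frame assigned on each memory block.
    """
    bits_per_frame = slots_per_block * blocks * bus_bits
    # One time frame lasts frame_slots clock cycles, i.e. frame_slots / clock_hz
    # seconds, so the rate is bits_per_frame * clock_hz / frame_slots.
    return bits_per_frame * clock_hz // frame_slots


clock = 100_000_000  # hypothetical 100 MHz data bus clock

print(port_rate_bps(1, clock))    # one slot per block per frame
print(port_rate_bps(12, clock))   # twelve slots per block per frame
```

Under these assumptions one slot per block sustains 100 Mbit/s and twelve slots sustain 1.2 Gbit/s, which is consistent with ten slots covering the nominal 1 Gbit/s rate and two slots being extra.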

[0083] In accordance with a worst case scenario, for any physical port 402, a 1 byte long EOF granule is conveyed to and from the odd shared memory block 404. This means that 7 bytes of the odd shared memory block 404 bandwidth remain idle and furthermore, the next 8 bytes of bandwidth allocated on the even shared memory block 404 will also be idle since the next PDU starts being conveyed to or from the odd shared memory block 404. In this scenario a maximum of 15 bytes of bandwidth seem to be wasted per PDU transfer. In processing PDUs defined by data transfer protocols such as the IEEE 802.x Ethernet Standard, this bandwidth loss is in fact acceptable since an inter-PDU gap of 20 bytes is specified.
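By way of illustration only, the idle bandwidth per PDU can be computed as sketched below, assuming that every PDU starts on the odd shared memory block 404 and that granules are 8 bytes long. The function name is hypothetical, and the maximum PDU length of 1518 bytes used in the scan is an assumed Ethernet frame size.

```python
def idle_bytes(pdu_len, granule_bytes=8, blocks=2):
    """Return the bytes of allocated bandwidth left idle after a PDU.

    The next PDU starts on the odd block, so each PDU effectively occupies
    a whole number of odd/even granule pairs.
    """
    stride = granule_bytes * blocks   # 16 bytes per odd/even pair
    return (-pdu_len) % stride


# Worst case: the final granule holds a single byte and lands on the odd block.
worst = max(idle_bytes(n) for n in range(1, 1519))
print(idle_bytes(17), worst)
```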

[0084] An exemplary partitioning of the available bandwidth is presented in Table 1:

TABLE 1
Exemplary dual bus arbitration schedule

No.   Odd Block   Even Block
0     SE(0)       W(14)
1     SE(0)       R(15)
2     SE(0)       R(G0)
3     R(0)        R(16)
4     R(G0)       R(G1)
5     R(1)        R(17)
6     R(G1)       R(G0)
7     R(2)        R(18)
8     R(G0)       R(G1)
9     R(3)        R(19)
10    R(G1)       R(G0)
11    R(4)        SE(1)
12    R(G0)       SE(1)
13    T           SE(1)
14    T           T
15    W(0)        T
16    W(G0)       W(G1)
17    W(1)        W(15)
18    W(G1)       W(G0)
19    W(2)        W(16)
20    W(G0)       W(G1)
21    W(3)        W(17)
22    W(G1)       W(G0)
23    W(4)        W(18)
24    W(G0)       W(G1)
25    SE(2)       W(19)
26    SE(2)       R(G1)
27    SE(2)       R(20)
28    R(G1)       R(G0)
29    R(5)        R(21)
30    R(G0)       R(G1)
31    R(6)        R(22)
32    R(G1)       R(G0)
33    R(7)        R(23)
34    R(G0)       R(G1)
35    R(8)        SE(3)
36    R(G1)       SE(3)
37    R(9)        SE(3)
38    RSE         R(24)
39    T           T
40    T           T
41    W(5)        W(20)
42    W(G1)       W(G0)
43    W(6)        W(21)
44    W(G0)       W(G1)
45    W(7)        W(22)
46    W(G1)       W(G0)
47    W(8)        W(23)
48    W(G0)       W(G1)
49    WSE         WSE
50    W(G1)       W(G0)
51    SE(4)       W(24)
52    SE(4)       R(0)
53    SE(4)       R(G0)
54    RMG         R(1)
55    R(G0)       R(G1)
56    R(10)       R(2)
57    R(G1)       R(G0)
58    R(11)       R(3)
59    R(G0)       R(G1)
60    R(12)       R(4)
61    R(G1)       R(G0)
62    R(13)       SE(5)
63    R(G0)       SE(5)
64    R(14)       SE(5)
65    T           RMG
66    T           T
67    WMG         T
68    W(G0)       W(G1)
69    W(9)        W(0)
70    W(G1)       W(G0)
71    W(10)       W(1)
72    W(G0)       W(G1)
73    W(11)       W(2)
74    W(G1)       W(G0)
75    W(12)       W(3)
76    W(G0)       W(G1)
77    W(13)       W(4)
78    SE(6)       WMG
79    SE(6)       R(5)
80    SE(6)       R(G1)
81    R(15)       R(6)
82    R(G1)       R(G0)
83    R(16)       R(7)
84    R(G0)       R(G1)
85    R(17)       R(8)
86    R(G1)       R(G0)
87    R(18)       R(9)
88    R(G0)       R(G1)
89    R(19)       SE(7)
90    R(G1)       SE(7)
91    T           SE(7)
92    T           T
93    W(14)       T
94    W(G1)       W(G0)
95    W(15)       W(5)
96    W(G0)       W(G1)
97    W(16)       W(6)
98    W(G1)       W(G0)
99    W(17)       W(7)
100   W(G0)       W(G1)
101   W(18)       W(8)
102   W(G1)       W(G0)
103   SE(8)       W(9)
104   SE(8)       R(10)
105   SE(8)       R(G0)
106   R(20)       R(11)
107   R(G0)       R(G1)
108   R(21)       R(12)
109   R(G1)       R(G0)
110   R(22)       R(13)
111   R(G0)       R(G1)
112   R(23)       R(14)
113   R(G1)       SE(9)
114   R(24)       SE(9)
115   T           SE(9)
116   T           RSE
117   W(24)       T
118   W(19)       T
119   W(G0)       W(G1)
120   W(20)       W(10)
121   W(G1)       W(G0)
122   W(21)       W(11)
123   W(G0)       W(G1)
124   W(22)       W(12)
125   W(G1)       W(G0)
126   W(23)       W(13)
127   WSE         WSE

[0085] where: R(port designation) represents the read memory access cycle performed by TxDMA(port designation) device 420, W(port designation) is the write memory access cycle performed by RxDMA(port designation) device 420, SE(#)'s are the memory access cycles of the PDU classifier 110 obtaining header information, RSE is the memory access cycle used in consulting the switching database and T is the memory turn around latency time between read and write memory access cycles.

[0086] Further enhancements in utilizing the available bandwidth B include the storage of the switching database in both shared memory blocks 404 to eliminate contention in accessing the routing information. Time slots 204: No. 49 and No. 127 of Table 1 show parallel switching database updates labeled WSE. The parallel updates also eliminate inconsistencies between the shared memory blocks 404. The invention is not limited to parallel switching database updates, although preferred, the updates can be made in sequence as long as PDU classifier 110/610 RSE (No. 38 odd and No. 116 even) memory access cycles are not performed in between.

[0087] The above-described memory access scheme eliminates the need to synchronize memory access cycles, which would otherwise require synchronization hardware and incur synchronization overheads.

[0088] The invention is not limited to supporting fixed length memory access cycles.

[0089] The durations of read memory access cycles and write memory access cycles are independent of each other and are not necessarily limited to the length of a time slot 204. Similarly, time slots 204 need not correspond to single clock cycles. For some applications, the memory access cycle may be decoupled from the data bus clock. One time slot 204 can also represent multiple memory access cycles.

[0090] As shown in Table 1, the read memory access cycle SE(#) for the PDU classifier 110/610 uses three consecutive time slots 204 in determining a source data network node identifier, a destination data network node identifier, and PDU treatment information used in processing the PDU. Similar provisions can be made for other types of memory access cycles if the bandwidth requirement for that particular memory access cycle and the required memory access cycle implementation are known.
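The multi-slot allocation can be read directly off the schedule by collapsing consecutive identical entries into runs. The sketch below does this for time slots No. 0 to No. 15 of the odd block of Table 1; the function name `runs` is hypothetical and the fragment is used only to keep the example short.

```python
from itertools import groupby

# Odd block, time slots No. 0 to No. 15, copied from Table 1.
ODD_FRAGMENT = ["SE(0)", "SE(0)", "SE(0)", "R(0)", "R(G0)", "R(1)", "R(G1)", "R(2)",
                "R(G0)", "R(3)", "R(G1)", "R(4)", "R(G0)", "T", "T", "W(0)"]


def runs(schedule):
    """Collapse consecutive identical entries into (entry, first_slot, length)."""
    out = []
    slot = 0
    for entry, group in groupby(schedule):
        length = len(list(group))
        out.append((entry, slot, length))
        slot += length
    return out


if __name__ == "__main__":
    for entry, first_slot, length in runs(ODD_FRAGMENT):
        if length > 1:
            print(f"{entry}: {length} consecutive slots starting at No. {first_slot}")
```

In this fragment the classifier cycle SE(0) spans three time slots and the turnaround T spans two, while each port read or write occupies a single time slot.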

[0091] FIG. 9 is a schematic diagram showing a data switching node having a plurality of data bus connected devices in accordance with yet another exemplary implementation of the preferred embodiment of the invention.

[0092] The invention is not limited to the design shown. Additional devices such as supervisory processors 906, statistics gathering devices 902 monitoring data traffic, a console manager 904, etc. may be accommodated to access data bus connected devices via bandwidth partitioning, as shown by way of example at WMG/RMG in Table 1.

[0093] In order to reduce electrical noise in the data bus lines 406, DMA devices 420/620 can be grouped via multiplexer/demultiplexer intermediaries 950. The memory access schedule would have to be sequenced such that access collisions do not occur within each group of DMA devices.
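One possible way to validate a candidate schedule against such a grouping is sketched below. It rests on assumptions that are not stated in the description: an access collision is taken to mean that two DMA devices of the same group are scheduled in the same time slot on the two constituent data buses, and the assignment of ports to groups (`port // 13`) is purely hypothetical. Only entries of the form R(n) or W(n) with a numeric port designation are examined.

```python
import re

# Odd and even blocks, time slots No. 0 to No. 15, copied from Table 1.
ODD = ["SE(0)", "SE(0)", "SE(0)", "R(0)", "R(G0)", "R(1)", "R(G1)", "R(2)",
       "R(G0)", "R(3)", "R(G1)", "R(4)", "R(G0)", "T", "T", "W(0)"]
EVEN = ["W(14)", "R(15)", "R(G0)", "R(16)", "R(G1)", "R(17)", "R(G0)", "R(18)",
        "R(G1)", "R(19)", "R(G0)", "SE(1)", "SE(1)", "SE(1)", "T", "T"]

PORT = re.compile(r"^[RW]\((\d+)\)$")


def port_of(entry):
    """Return the numeric port designation of R(n)/W(n), or None for other entries."""
    match = PORT.match(entry)
    return int(match.group(1)) if match else None


def find_collisions(odd, even, group_of):
    """List the slots in which both buses serve ports of the same group (assumed rule)."""
    collisions = []
    for slot, (a, b) in enumerate(zip(odd, even)):
        pa, pb = port_of(a), port_of(b)
        if pa is not None and pb is not None and group_of(pa) == group_of(pb):
            collisions.append(slot)
    return collisions


if __name__ == "__main__":
    # Hypothetical grouping: ports 0-12 behind one intermediary, ports 13-24 behind another.
    print(find_collisions(ODD, EVEN, lambda port: port // 13))
```

Under the hypothetical grouping shown, the fragment of Table 1 yields no collisions, since each time slot pairs a low-numbered port on one bus with a high-numbered port on the other.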

[0094] The embodiments presented are exemplary only and persons skilled in the art would appreciate that variations to the above described embodiments may be made without departing from the spirit of the invention. The scope of the invention is solely defined by the appended claims. 

We claim:
 1. A data network node processing data comprising: a. a divided aggregate data bus having an aggregate data bus width W; b. a divided shared memory store accessed via the aggregate data bus; c. a plurality of data bus connected devices; and d. a deterministic data bus arbitration schedule apportioning an aggregate bandwidth B of the aggregate data bus to the plurality of data bus connected devices whereby data processing latencies in arbitrating the access to the aggregate data bus for the plurality of data bus connected devices are reduced.
 2. A data network node as claimed in claim 1, wherein the divided aggregate data bus further comprises at least two constituent data buses, each one of the constituent data buses having a data bus width w whereby the aggregate data bus retains the complexity of each individual constituent data bus while providing increased bandwidth B for processing data at the data network node without increasing an associated data bus clock frequency F.
 3. A data network node as claimed in claim 2, wherein the divided shared memory store comprises at least two shared memory blocks, each shared memory block being accessible via a single corresponding constituent data bus.
 4. A data network node as claimed in claim 1, wherein the plurality of data bus connected devices includes at least one data port enabling the conveyance of data between the data network node and an associated data transport network.
 5. A data network node as claimed in claim 4, wherein the at least one data port comprises an input data port conveying incoming data to the data network node.
 6. A data network node as claimed in claim 4, wherein the at least one data port comprises an output port conveying outgoing data from the data network node.
 7. A data network node as claimed in claim 4, wherein the data network node comprises a data switching node forwarding data between a plurality of data ports.
 8. A data network node as claimed in claim 7, wherein the shared memory store further comprises a switching database specifying routing information used in forwarding data between the plurality of data ports.
 9. A data network node as claimed in claim 8, wherein the switching database is stored in each one of a plurality of shared memory blocks of the divided shared memory store whereby contention of access to the switching database is prevented.
 10. A data network node as claimed in claim 7, wherein the data has a structure defined by data transport protocols which specify the encapsulation of conveyed data into Protocol Data Units (PDUs).
 11. A data network node as claimed in claim 10, wherein the plurality of data bus connected devices includes at least one PDU classifier determining at least one data port for each PDU to be forwarded to.
 12. A data network node as claimed in claim 1, wherein the data network node comprises a group of selectable pre-set data bus arbitration schedules enabling different bandwidth apportionments associated with the plurality of data bus connected devices.
 13. A data network node as claimed in claim 1, wherein the plurality of data bus connected devices includes at least one data flow statistics generator monitoring data processed at the data network node.
 14. A data network node as claimed in claim 1, wherein the plurality of data bus connected devices includes at least one supervisory processor running an associated protocol monitoring data processed at the data network node to update the deterministic data bus arbitration schedule in efficiently apportioning the aggregate bandwidth B between the plurality of data bus connected devices.
 15. A data network node as claimed in claim 1, wherein the data network node comprises an associated management console for managing the operation of the data network node including the specification of the deterministic data bus arbitration to be used.
 16. A data network node as claimed in claim 1, wherein a subgroup of data bus connected devices further comprises at least one Direct Memory Access (DMA) device for accessing the shared memory store.
 17. A data network node as claimed in claim 16, wherein a DMA device is used for each direction of data transmission for data port data bus connected devices, to provide simultaneous reception and transmission of data therethrough.
 18. A data network node as claimed in claim 1, wherein data bus connected devices are connected to the data bus via multiplexer/demultiplexer intermediaries to reduce noise in the data bus lines enabling a higher speed operation thereof.
 19. A data network node as claimed in claim 2, wherein each data bus connected device further comprises at least one Direct Memory Access (DMA) device associated with each constituent data bus.
 20. A data network node as claimed in claim 3, wherein the deterministic data bus arbitration schedule further comprises read memory access cycles grouped sequentially and write memory access cycles grouped sequentially whereby a number of changes between read memory access cycles and write memory access cycles is reduced thereby reducing associated memory access latencies incurred in changing therebetween.
 21. A data network node as claimed in claim 1, wherein the deterministic data bus access schedule further comprises a time line for each constituent data bus.
 22. A data network node as claimed in claim 1, wherein the deterministic data bus arbitration schedule comprises a time frame for each one of the constituent data buses, each time frame being cyclically paced through in coordinating access to the corresponding constituent data bus whereby the design of the data network node is simplified by using the deterministic arbitration schedule.
 23. A data network node as claimed in claim 22, wherein the data network node further comprises at least one arbiter coordinating access to the aggregate data bus in accordance with the deterministic data bus access schedule.
 24. A data network node as claimed in claim 22, wherein the data network node further comprises an arbiter for each constituent data bus, each arbiter coordinating access to a corresponding constituent data bus in accordance with the deterministic data bus access schedule.
 25. A data network node as claimed in claim 22, wherein each time frame further comprises a plurality of time slots, the bandwidth B of the aggregate data bus being apportioned, in terms of the time slots, to the plurality of data bus connected devices, whereby the granularity of the bandwidth apportionment is controlled by the number of time slots per time frame.
 26. A data network node as claimed in claim 25, wherein at least one time slot specifies the extent of a memory access cycle.
 27. A data network node as claimed in claim 25, wherein at least one time slot corresponds to a plurality of memory access cycles.
 28. A method of arbitrating access to a divided aggregate data bus for a plurality of data bus connected devices, the method comprising the steps of: a. dividing a stream of data conveyed via a data bus connected device into an aggregate stream of data granules; b. coordinating the access to the aggregate data bus according to a deterministic data bus arbitration schedule; and c. conveying the aggregate stream of data granules over the aggregate data bus in accordance with the deterministic data bus arbitration schedule whereby the use of the deterministic data bus arbitration schedule reduces processing overheads associated with the arbitration of access to the aggregate data bus.
 29. A method as claimed in claim 28, wherein the aggregate data bus is divided into at least two constituent data buses and the step of dividing the data stream into the aggregate stream of granules further includes a step of: further dividing the aggregate stream of data granules into at least two constituent streams of data granules, each constituent stream of data granules corresponding to a one constituent data bus.
 30. A method as claimed in claim 29, wherein the step of coordinating the access to the aggregate data bus, the method further comprises a step of: scheduling the access to a one constituent data bus for the conveyance of at least one data granule from the corresponding constituent stream of data granules.
 31. A method as claimed in claim 30, wherein the step of conveying the stream of data granules over the aggregate data bus, the method further comprises a step of: conveying data granules corresponding to the at least two constituent data buses in a sequence repeatedly cycling through the at least two constituent data buses.
 32. A method as claimed in claim 29, wherein each one of the at least two constituent data buses has a constituent data bus width w and the step of dividing the data stream into the aggregate stream of data granules the method further comprises a step of: dividing the data stream into data granules, each data granule having a size equal to the data bus width w of each one of the at least two constituent data buses.
 33. A method as claimed in claim 28, wherein a divided shared memory store is associated with the divided aggregate data bus, the divided aggregate data bus comprises at least two constituent data buses, the divided shared memory store comprises at least two shared memory blocks each of which is accessible via a one corresponding constituent data bus, the shared memory store further retrievably holding a database, the method further comprising a step of: storing the database in each one of the at least two shared memory blocks.
 34. A method as claimed in claim 33, wherein the method further comprises a step of updating each copy of the database stored in each of the at least two shared memory blocks to prevent inaccuracies in the use thereof.
 35. A method as claimed in claim 33, wherein the method further comprises a step of sequentially updating each copy of the database stored in each of the at least two shared memory blocks, the sequential update being performed between database access instances. 